1 Time Series 7-Day Forecasting with SARIMAX for Individual Countries

1.1 Contents

1.2 Imports

1.3 Functions

1.3.1 Functions: feature_list

1.3.2 Functions: split_sequences

1.3.3 Functions: plot_actual_predicted

1.3.4 Functions: split_sequence_features

1.3.5 Functions: rmse_y_y_pred

1.3.6 Functions: rmse_y_y_pred_country

1.3.7 Functions: simplify_cats

1.4 Load data

1.5 Preprocessing

1.5.1 Preprocessing: get feature and target for model training and testing with cross-validation

1.5.2 Preprocessing: scale feature and target variables

1.5.3 Preprocessing: get feature names

1.6 SARIMAX model EDA

A classic approach to time series forecasting gives a parsimonious description of a (weakly) stationary stochastic process in terms of two polynomials, one for the autoregression (AR) and one for the moving average (MA), hence the name 'ARMA'. ARMA has been extended with differencing (the 'I', for integrated) to handle non-stationary series, with seasonal terms (S) to capture seasonality, and with exogenous (X) variables to improve predictive power. The resulting SARIMAX model forecasts a time series from four components: 1) autoregressive, 2) moving-average, 3) seasonal, and 4) exogenous. Here we use a SARIMAX model to forecast COVID-19 cases days and weeks into the future. We first perform an EDA to find plausible ranges for the key parameters, which are then used in the automatic grid search for the best parameters in the next section.
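For reference, the two polynomials can be written out explicitly. The notation below is a common textbook form (regression with SARIMA errors) and is an assumption here, since the notebook does not state its own parameterization: $L$ is the lag operator, $s$ the seasonal period, $x_t$ the exogenous variables, and $\varepsilon_t$ white noise.

```latex
% ARMA(p, q): AR polynomial on the left, MA polynomial on the right
\phi(L)\, y_t = \theta(L)\, \varepsilon_t,
\qquad
\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p,
\qquad
\theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q

% SARIMAX(p, d, q)(P, D, Q)_s: regression on x_t with SARIMA errors u_t
y_t = \beta^{\top} x_t + u_t,
\qquad
\phi(L)\, \Phi(L^s)\, (1 - L)^d\, (1 - L^s)^D\, u_t
  = \theta(L)\, \Theta(L^s)\, \varepsilon_t
```

Here $\Phi$ and $\Theta$ are the seasonal AR and MA polynomials of orders $P$ and $Q$, and $d$ and $D$ are the non-seasonal and seasonal differencing orders.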

The series shows a clear seasonality with a period of 7 days, reflecting weekly fluctuations, which suggests that SARIMA is a better choice than ARIMA.
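One quick way to check for a weekly cycle is the sample autocorrelation of the differenced series, which should peak at lag 7. The sketch below is a minimal pure-Python illustration on a synthetic series, not the notebook's data; the helper `autocorr` and the synthetic trend and amplitude are assumptions made for the example.

```python
import math


def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


# Synthetic daily series (assumption): slow upward trend plus a 7-day cycle.
series = [100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / 7) for t in range(140)]

# First-difference to remove the trend before looking at the autocorrelation.
diffed = [series[t] - series[t - 1] for t in range(1, len(series))]

# Autocorrelation for lags 1..14; a weekly cycle shows up as a peak at lag 7.
acf = {lag: autocorr(diffed, lag) for lag in range(1, 15)}
best_lag = max(acf, key=acf.get)
print("lag with the highest autocorrelation:", best_lag)
```

On real case counts the peak is noisier, so it is worth plotting the full autocorrelation function rather than relying on a single maximum.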

Try a grid search to estimate the SARIMA parameters

The plot above shows that the SARIMAX model fits the actual case data well.

The model summary shows that the autoregressive (AR) and moving-average (MA) components of the model are highly significant, as expected. Some of the exogenous variables, such as 'vaccination', 'mobility in workplace', and the temporal categorical features (year, month, dayofm), are also significant. Next, we forecast with the SARIMAX model for all countries.

1.6 SARIMAX model

1.6.1 Organize and evaluate model performance: train set

1.6.2 Organize and evaluate model performance: validation set

1.6.3 Organize and evaluate model performance: test set

1.6.4 Train and test the SARIMAX model

1.7 Save and plot model performance

1.7.1 Save data

1.7.2 Plot model performance